## Understanding backtests {: #understanding-backtests }

Backtesting is conceptually the same as cross-validation in that it lets you test a predictive model using existing historical data. That is, you can evaluate how the model would have performed historically to estimate how it will perform in the future. Unlike cross-validation, however, backtests let you select specific time periods or durations for testing instead of random rows, creating in-sequence “trials” for your data rather than randomly sampled ones. So, instead of saying “break my data into 5 folds of 1000 random rows each,” with backtests you say “simulate training on 1000 rows, predicting on the <em>next</em> 10. Do that 5 times.” Backtests simulate training the model on an older period of training data, then measure performance on a newer period of validation data. After models are built, you can [change the training](#change-the-training-period) range and sampling rate from the Leaderboard. DataRobot then retrains the models on the shifted training data.
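
The “train on 1000 rows, predict on the next 10, do that 5 times” idea can be sketched in plain Python. This is only an illustration of in-sequence splitting, not DataRobot's implementation; the function name `backtest_splits`, its parameters, and the choice to list the newest fold first are assumptions made for the example.

```python
def backtest_splits(n_rows, n_backtests, train_size, validation_size):
    """Return (train, validation) row ranges for each backtest.

    Hypothetical helper: folds are listed newest first, and each
    validation range immediately follows its training range in time.
    Rows are assumed to be sorted from oldest to newest.
    """
    splits = []
    for k in range(n_backtests):
        # Step the validation window back by one window per backtest.
        val_end = n_rows - k * validation_size
        val_start = val_end - validation_size
        train_end = val_start
        train_start = train_end - train_size
        if train_start < 0:
            raise ValueError("not enough rows for the requested backtests")
        splits.append((range(train_start, train_end),
                       range(val_start, val_end)))
    return splits


# 5 backtests, each training on 1000 rows and validating on the next 10.
splits = backtest_splits(n_rows=1050, n_backtests=5,
                         train_size=1000, validation_size=10)
for train, val in splits:
    print(f"train rows {train.start}-{train.stop - 1}, "
          f"validate rows {val.start}-{val.stop - 1}")
```

No validation row ever precedes a training row in the same fold, which is what distinguishes this from randomly sampled cross-validation folds.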

If the goal of your project is to predict forward in time, backtesting gives you a better understanding of model performance (on a time-based problem) than cross-validation. For time series problems, this equates to more confidence in your predictions. Backtesting confirms model robustness by allowing you to see whether a model consistently outperforms other models across all folds.

The default number of backtests depends on the project parameters, but you can configure the build to include up to 20 backtests. Additional backtests do not change the model itself; they provide more trials of your model so that you can be more confident in your accuracy estimates. You can configure the duration and dates so that you can, for example, generate “10 two-month predictions.” Once backtests are configured, for example to avoid specific periods, you can ask “Are the predictions similar?” or, for two similar months, “Are the errors the same?”

Large gaps in your data can make backtesting difficult. If your dataset has long periods of time without any observed data, it is prudent to review where these gaps fall in your backtests. For example, if a validation window has too few data points, choosing a longer validation window generally gives more reliable validation scores. While using more backtests may give you a more reliable measure of model performance, it also decreases the maximum training window available to the earliest backtest fold.
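
Both points can be checked with a few lines of Python before configuring a project. The sketch below is an assumption-laden illustration, not a DataRobot feature: the helper names `validation_counts` and `earliest_training_days` are hypothetical, windows are fixed-length and back-to-back, and the sample data is synthetic daily data with February missing.

```python
from datetime import date, timedelta


def validation_counts(dates, end, window_days, n_backtests):
    """Count observations in each validation window, newest window first."""
    counts = []
    for k in range(n_backtests):
        stop = end - timedelta(days=k * window_days)
        start = stop - timedelta(days=window_days)
        counts.append(sum(start <= d < stop for d in dates))
    return counts


def earliest_training_days(first, end, window_days, n_backtests):
    """Days of history left for training before the earliest validation window."""
    earliest_val_start = end - timedelta(days=n_backtests * window_days)
    return (earliest_val_start - first).days


# Synthetic daily data for Jan-Apr 2023 with all of February missing.
first, end = date(2023, 1, 1), date(2023, 5, 1)
dates = [first + timedelta(days=i) for i in range((end - first).days)]
dates = [d for d in dates if d.month != 2]

counts = validation_counts(dates, end, window_days=30, n_backtests=3)
print(counts)  # the oldest window overlaps the gap and has very few rows

# More backtests leave less history for the earliest fold to train on.
print(earliest_training_days(first, end, window_days=30, n_backtests=2))
print(earliest_training_days(first, end, window_days=30, n_backtests=3))
```

A window with a very low count, like the oldest one here, is a candidate for a longer validation duration or for moving the backtest away from the gap.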

## Understanding gaps {: #understanding-gaps }

Configuring gaps allows you to reproduce time gaps usually observed between model training and model deployment (a period for which data is not to be used for training). It is useful in cases where, for example:

* Only older data is available for training (because [ground truth](https://en.wikipedia.org/wiki/Ground_truth#Statistics_and_machine_learning){ target=_blank } is difficult to collect).
* A model’s validation and subsequent deployment take weeks or months.
* Predictions must be delivered in advance for review or action.

A simple example: in insurance, it can take roughly a year for a claim to "develop" (the time between filing and determining the claim payout). For this reason, an actuary is likely to price 2017 policies based on models trained with 2015 data. To replicate this practice, you can insert a one-year gap between the training set and the validation set. Model evaluation then better reflects the conditions under which the model will actually be used. Other examples include pricing that needs regulator approval, retail sales for a seasonal business, and pricing estimates that rely on delayed reporting.
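
The insurance example can be expressed as a date-based split with a gap. This is a minimal sketch under stated assumptions, not how DataRobot partitions data internally: the function `split_with_gap`, its parameter names, and the monthly placeholder records are all hypothetical.

```python
from datetime import date


def split_with_gap(records, train_start, train_end, gap_end, validation_end):
    """Split (date, value) records into training and validation sets.

    Records dated in [train_end, gap_end) fall in the gap and are
    used for neither training nor validation.
    """
    train = [r for r in records if train_start <= r[0] < train_end]
    validation = [r for r in records if gap_end <= r[0] < validation_end]
    return train, validation


# Placeholder monthly records for 2015 through 2017 (values are dummies).
records = [(date(year, month, 1), 0.0)
           for year in (2015, 2016, 2017)
           for month in range(1, 13)]

# Train on 2015, leave 2016 as a one-year gap, validate on 2017.
train, validation = split_with_gap(
    records,
    train_start=date(2015, 1, 1),
    train_end=date(2016, 1, 1),
    gap_end=date(2017, 1, 1),
    validation_end=date(2018, 1, 1),
)
print(len(train), len(validation))
```

Because no 2016 record reaches either set, the validation score is measured across the same one-year delay the deployed model would face.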
